fix: e2e-triage and e2e-fix workflows#757

Draft
alishakawaguchi wants to merge 14 commits into main from e2e-triage-fix

@alishakawaguchi alishakawaguchi commented Mar 23, 2026

Summary

  • claude-code-action@v1 does not install project plugins, so /e2e:triage-ci and /e2e:implement slash commands were silently not resolved
  • The triage step completed in 21ms with $0 API cost — the model was never called, producing empty triage.md output
  • Replaced slash commands with explicit "Read and follow .claude/skills/e2e/..." instructions that use the Read tool (already in allowedTools) to load skill files directly

Test plan

  • Re-trigger the E2E Triage workflow against the same failing run to confirm Claude produces non-empty triage output
  • Verify triage artifact contains actual findings (not empty)
  • Verify plan artifact contains an actual implementation plan

🤖 Generated with Claude Code


Note

Low Risk
Low-risk, workflow-only change, but it affects the automated triage/plan generation prompts and could alter or break CI triage output if the referenced skill files change or the instructions are misinterpreted.

Overview
The E2E triage workflow no longer uses the /e2e:triage-ci and /e2e:implement slash-command prompts. Instead it tells claude-code-action@v1 to Read and follow the procedures in .claude/skills/e2e/triage-ci.md and .claude/skills/e2e/implement.md.

It also makes the prompts explicitly pass the artifact path or run URL, agent, and SHA, and clarifies that CI artifact analysis should skip local re-run steps when a local artifact path is provided.

Written by Cursor Bugbot for commit 2ba0a47.

…iage workflow

claude-code-action@v1 does not install project plugins, so /e2e:triage-ci
and /e2e:implement slash commands were not resolved. The triage step completed
in 21ms with $0 API cost — the model was never called, producing empty output.

Replace slash commands with explicit "Read and follow" instructions that use
the Read tool (already in allowedTools) to load skill files directly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 77de372b0fce
Copilot AI review requested due to automatic review settings March 23, 2026 19:22
@alishakawaguchi alishakawaguchi self-assigned this Mar 23, 2026

Copilot AI left a comment

Pull request overview

Updates the E2E triage GitHub Actions workflow prompts to avoid relying on unresolved Claude Code project slash commands, by instructing the model to Read and follow the repo’s .claude/skills/e2e/* procedures directly.

Changes:

  • Replaces /e2e:triage-ci ... usage with explicit instructions to read .claude/skills/e2e/triage-ci.md, including passing inputs (artifact path / run URL, agent, SHA).
  • Replaces /e2e:implement ... usage with instructions to read .claude/skills/e2e/implement.md and then read the generated triage.md.

Comment on lines +214 to +220
  prompt: |
- /e2e:triage-ci ${{ env.RUN_URL }} --agent ${{ matrix.agent }} --sha ${{ needs.matrix-setup.outputs.sha }}
+ Read and follow the full E2E triage procedure from .claude/skills/e2e/triage-ci.md.
+
+ Inputs:
+ - CI run URL: ${{ env.RUN_URL }}
+ - Agent: ${{ matrix.agent }}
+ - SHA: ${{ needs.matrix-setup.outputs.sha }}

Copilot AI Mar 23, 2026

In rerun mode (Run triage (with re-runs)), the prompt provides a CI run URL as an input. The referenced procedure (.claude/skills/e2e/triage-ci.md, Step L1) explicitly says that when a CI run reference is provided, it should download artifacts and skip Steps L2–L5 (local re-runs). That means this rerun job path is likely to never execute the intended local re-runs, despite the extra setup/timeouts. Consider updating the prompt (or the skill) so rerun mode still performs Steps L2–L5 after identifying failing tests (e.g., use the run URL only to discover failures, then run local re-runs for verification).

Copilot uses AI. Check for mistakes.
Comment on lines +266 to +268
Read and follow the fix implementation procedure from .claude/skills/e2e/implement.md.

Read the triage findings at ${{ github.workspace }}/e2e-triage-artifacts/${{ matrix.agent }}/triage.md for agent ${{ matrix.agent }}.

Copilot AI Mar 23, 2026

The plan step prompt points to .claude/skills/e2e/implement.md, which mandates entering plan mode via /plan and later requires running real E2E tests before summary. In this workflow step you're only trying to generate a plan artifact; these instructions can push the agent to attempt expensive/long-running E2E executions or rely on /plan being recognized. Consider adding an explicit constraint in the prompt to only produce the implementation plan (use the EnterPlanMode tool if needed) and stop before applying changes or running tests.

Suggested change
- Read and follow the fix implementation procedure from .claude/skills/e2e/implement.md.
- Read the triage findings at ${{ github.workspace }}/e2e-triage-artifacts/${{ matrix.agent }}/triage.md for agent ${{ matrix.agent }}.
+ You are running in a planning-only workflow step. In this step you MUST NOT run or trigger any real E2E tests, shell commands, or code changes, and you MUST ONLY produce a written implementation plan.
+ 1. Read and use only the planning/implementation design guidance from .claude/skills/e2e/implement.md. If that document instructs you to enter plan mode via `/plan` or to run tests before summarizing, treat those as requirements for a future implementation step, not for this step. Do not actually run tests, apply changes, or rely on `/plan` being recognized here.
+ 2. Read the triage findings at ${{ github.workspace }}/e2e-triage-artifacts/${{ matrix.agent }}/triage.md for agent ${{ matrix.agent }}.
+ 3. Based on these inputs, write a concise, step-by-step implementation plan describing how a human or a later workflow should fix the identified issues.
+ Your output must be plain text only and must NOT include `/plan` commands, tool invocations, or instructions that this very agent run should execute tests or modify code. Stop after writing the plan.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Inputs:
- CI run URL: ${{ env.RUN_URL }}
- Agent: ${{ matrix.agent }}
- SHA: ${{ needs.matrix-setup.outputs.sha }}

Re-run path missing override to prevent skipping local tests

Medium Severity

The re-run triage prompt provides a CI run URL input, but triage-ci.md explicitly instructs the model to "skip Steps L2-L5 (local re-runs)" when a CI run reference is given. The analysis-only path correctly includes a redundant skip instruction, but the re-run path omits the opposite override — telling the model to actually execute L2-L5. This means the model will likely skip local re-runs, making the entire re-run path (agent CLI installation, bootstrap, tmux, Bash tool access) useless.

alishakawaguchi and others added 13 commits March 23, 2026 12:34
claude-code-action's OIDC token exchange requires the workflow file to
match the default branch, preventing testing on feature branches. Pass
github_token directly to bypass this restriction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: d73c731f6abc
Merge the two claude-code-action invocations (triage + plan) into a
single prompt that runs in plan mode. Claude writes both triage findings
and fix plan to the plan file, which is then extracted and split into
triage.md and plan.md artifacts.

Benefits:
- Single invocation reduces cost (~$0.60 vs $0.86)
- Plan mode gives structured reasoning for fix plans
- Plan content captured from file (fixes empty plan artifact)
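The split step could look roughly like the sketch below. The commit does not say how the plan file is delimited, so the `## Fix Plan` marker and the function name are assumptions made purely for illustration.

```python
def split_plan_file(content: str, marker: str = "## Fix Plan") -> tuple[str, str]:
    """Split a combined plan file into (triage, plan) text.

    `marker` is a hypothetical heading that starts the fix-plan section;
    the real delimiter used by the workflow is not stated in this PR.
    """
    lines = content.splitlines()
    for index, line in enumerate(lines):
        if line.strip() == marker:
            triage = "\n".join(lines[:index]).strip()
            plan = "\n".join(lines[index:]).strip()
            return triage, plan
    # No marker found: treat everything as triage findings, leave the plan empty.
    return content.strip(), ""
```

Returning an empty plan when the marker is missing keeps the failure visible downstream instead of silently duplicating the triage text.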

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 9a30a28f697c
- Set display_report: false to remove tool call noise from summary tab
- Add custom job summary step that shows triage + plan markdown
- Remove redundant "Upload plan artifact" (triage upload has both files)
- Copy execution.json to artifacts for debugging

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 609909ae979f
…tion, log Slack errors

- Collapse per-agent matrix into single triage job so Claude can correlate
  failures across agents and find shared root causes
- Add "triage starting" Slack notification in setup job before triage begins
- Log actual Slack API error field in post-slack-message.sh (was silently
  swallowing the error, making failures impossible to diagnose)
- Update e2e-fix workflow to download single unified triage artifact
- URL-encode failed_agents in Fix It URL since it may contain commas
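
As a sketch of the URL-encoding point in the last bullet: the base URL and the use of `triage_run_id` and `failed_agents` as query parameters are assumptions, since the PR does not show the actual Fix It URL format.

```python
from urllib.parse import urlencode


def build_fix_url(base_url: str, triage_run_id: str, failed_agents: list[str]) -> str:
    """Build a Fix It link with a safely encoded, comma-separated agent list.

    `base_url` and the query parameter names are hypothetical.
    """
    query = urlencode({
        "triage_run_id": triage_run_id,
        # urlencode percent-encodes the commas (as %2C), so the list
        # survives as a single query value.
        "failed_agents": ",".join(failed_agents),
    })
    return f"{base_url}?{query}"
```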

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 74562887fe7f
…te it

The "Post triage starting" step was silently skipped because
env.SLACK_BOT_TOKEN was only set at the step level, but GitHub Actions
evaluates if: conditions before step env is applied.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 585bce40d0d2
Slack thread_ts must be in dot-decimal format (e.g., "1482960137.003543").
The dot gets stripped somewhere in the dispatch pipeline (URL encoding or
GitHub Actions numeric coercion), causing Slack to reject with
invalid_thread_ts. Re-insert the dot assuming 6 decimal places.

Ref: https://api.slack.com/messaging/retrieving
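
A minimal sketch of the normalization described above, under the same assumption the commit makes (the fractional part is always 6 digits). The function name is illustrative and not taken from the workflow.

```python
import re


def normalize_thread_ts(raw: str) -> str:
    """Re-insert the dot in a Slack thread_ts whose dot was stripped.

    Assumes exactly 6 fractional digits. Values that already contain
    a dot are returned unchanged.
    """
    ts = raw.strip()
    if "." in ts:
        return ts
    if not re.fullmatch(r"[0-9]{7,}", ts):
        raise ValueError(f"not a dot-stripped thread_ts: {raw!r}")
    return f"{ts[:-6]}.{ts[-6:]}"
```

For example, `normalize_thread_ts("1482960137003543")` gives `"1482960137.003543"`, the format shown in the commit message.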

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 0009b566bbbc
- Simplify "triage started" message: remove agent list, link to triage run
- Merge "triage completion" and "fix plan" steps into single "triage result"
- Use Block Kit with green "Fix It" button instead of plain text link
- Remove raw plan/markdown dump from Slack thread
- Add --payload flag to post-slack-message.sh for Block Kit support
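
For illustration, a Block Kit payload with a green button might be built as below. The block layout and function name are assumptions. `"style": "primary"` is the Block Kit button style that Slack renders in green, and how post-slack-message.sh consumes the payload through `--payload` is not shown in this PR.

```python
def triage_result_payload(channel: str, thread_ts: str, summary: str, fix_url: str) -> dict:
    """Build a chat.postMessage payload with a summary and a "Fix It" link button.

    The exact blocks used by the workflow are not shown in the PR;
    this layout is a hypothetical example.
    """
    return {
        "channel": channel,
        "thread_ts": thread_ts,
        "text": summary,  # plain-text fallback for notifications
        "blocks": [
            {"type": "section", "text": {"type": "mrkdwn", "text": summary}},
            {
                "type": "actions",
                "elements": [
                    {
                        "type": "button",
                        "text": {"type": "plain_text", "text": "Fix It"},
                        "url": fix_url,
                        "style": "primary",  # rendered as a green button
                    }
                ],
            },
        ],
    }
```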

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: be8091cb744b
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 6ce3944ef17d
- Add thread_ts dot normalization (same fix as triage workflow)
- Simplify "fix started" message with emoji, link to fix run
- Success: broadcast ":review: E2E fix applied: <PR_URL>" to channel
  using reply_broadcast + unfurl_links
- Failure: ":x: E2E fix failed" with run link (thread only)
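
The success and failure cases could be sketched as follows. `reply_broadcast` and `unfurl_links` are real `chat.postMessage` arguments. The function name and the exact failure text layout are assumptions.

```python
def fix_result_payload(channel: str, thread_ts: str, succeeded: bool,
                       pr_url: str = "", run_url: str = "") -> dict:
    """Build the Slack message for a finished fix run.

    Success is broadcast to the channel; failure stays in the thread.
    """
    payload = {"channel": channel, "thread_ts": thread_ts}
    if succeeded:
        payload["text"] = f":review: E2E fix applied: {pr_url}"
        payload["reply_broadcast"] = True  # also show the threaded reply in the channel
        payload["unfurl_links"] = True     # let Slack expand the PR link
    else:
        # Hypothetical wording: the PR only says the message carries a run link.
        payload["text"] = f":x: E2E fix failed: {run_url}"
    return payload
```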

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: dc9d1a373abd
After applying fixes, the workflow now installs agent CLIs and runs
the actual failing E2E tests twice per agent to confirm the fix works.
If verification fails, Claude Code gets one more attempt with the
failure output as context, then tests run again.

PR is only created after E2E verification passes, and Slack messages
now report verification status (attempt count, pass/fail).
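
The verify-then-retry flow described above can be sketched like this. `run_tests` and `retry_fix` are hypothetical callables standing in for the E2E test step and the second Claude Code invocation.

```python
from typing import Callable


def verify_fix(run_tests: Callable[[], tuple[bool, str]],
               retry_fix: Callable[[str], None],
               runs_per_attempt: int = 2,
               max_attempts: int = 2) -> tuple[bool, int]:
    """Return (passed, attempts_used).

    Each attempt runs the failing tests `runs_per_attempt` times. If an
    attempt fails and another attempt remains, `retry_fix` receives the
    failure output as context before the tests run again.
    """
    for attempt in range(1, max_attempts + 1):
        failure_output = ""
        passed = True
        for _ in range(runs_per_attempt):
            ok, output = run_tests()
            if not ok:
                passed = False
                failure_output = output
                break
        if passed:
            return True, attempt
        if attempt < max_attempts:
            retry_fix(failure_output)
    return False, max_attempts
```

The returned attempt count is what a Slack message would need to report verification status. PR creation would only follow when the first element is `True`.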

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 8eae20f22eab
The fix workflow now only requires triage_run_id — run_url and
failed_agents are auto-detected from metadata.json in the triage
artifacts. Explicit inputs still take precedence for backward
compatibility with the Slack "Fix It" button.

The triage workflow now writes metadata.json (run_url, failed_agents,
sha) alongside its existing plan/triage artifacts.
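
A sketch of the precedence rule, assuming metadata.json stores `run_url` and `failed_agents` as strings (the PR names the fields but not their types). The function name is illustrative.

```python
import json
from pathlib import Path


def resolve_fix_inputs(artifact_dir: str, run_url: str = "", failed_agents: str = "") -> dict:
    """Fill missing workflow inputs from metadata.json in the triage artifacts.

    Explicit (non-empty) inputs win; metadata.json is only a fallback.
    """
    metadata_path = Path(artifact_dir) / "metadata.json"
    metadata = json.loads(metadata_path.read_text()) if metadata_path.is_file() else {}
    return {
        "run_url": run_url or metadata.get("run_url", ""),
        "failed_agents": failed_agents or metadata.get("failed_agents", ""),
    }
```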

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: ce6d58f81698
gh run download expects a numeric run ID but the Cloudflare Worker
or manual trigger may pass a full GitHub Actions URL. Extract the
numeric ID from the URL before calling gh run download.
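
The extraction could be sketched as below. The function name is an assumption, and the pattern relies on the standard `.../actions/runs/<id>` shape of GitHub Actions run URLs.

```python
import re


def extract_run_id(value: str) -> str:
    """Return the numeric run ID from either a bare ID or a full run URL."""
    value = value.strip()
    if re.fullmatch(r"[0-9]+", value):
        return value  # already a numeric run ID
    match = re.search(r"/actions/runs/([0-9]+)", value)
    if match is None:
        raise ValueError(f"cannot find a run ID in {value!r}")
    return match.group(1)
```

The result is the value that would then be handed to `gh run download`.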

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 0c1ad5d2c425
The action validates that the workflow file matches the default branch.
Passing github_token allows it to authenticate and run on feature
branches.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 887ffbfe48ff
@alishakawaguchi alishakawaguchi changed the title from "fix: replace slash commands with explicit Read instructions in e2e-triage workflow" to "fix: e2e-triage and e2e-fix workflows" Mar 24, 2026

2 participants